Smart Paper Notes

Meta-Generalization for Multiparty Privacy Learning to Identify Anomaly Multimedia Traffic in Graynet

Satoshi Kamo , Yiqiang Sheng

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-01-09

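Identifying anomaly multimedia traffic in cyberspace is a major challenge for distributed service systems, multi-generation networks, and the future Internet of Everything. This letter explores meta-generalization for a multiparty privacy learning model in graynet to improve the performance of anomaly multimedia traffic identification. The multiparty privacy learning model in graynet is a globally shared model that is partitioned, distributed, and trained by exchanging multiparty parameter updates while preserving private data. Meta-generalization refers to discovering the inherent attributes of a learning model in order to reduce its generalization error. In the experiments, three meta-generalization principles are tested as follows. First, the generalization error of the multiparty privacy learning model in graynet is reduced by changing the dimension of the byte-level embedding. Next, the error is reduced by adapting the depth used to extract packet-level features. Finally, the error is reduced by adjusting the size of the support set used to preprocess traffic-level data. Experimental results show that the proposal outperforms state-of-the-art learning models for identifying anomaly multimedia traffic.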
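The abstract above describes a globally shared model trained by exchanging parameter updates rather than private data. A minimal sketch of that general idea follows, assuming a toy one-dimensional linear model and size-weighted averaging of updates (a federated-averaging style rule). The function names `local_update`, `aggregate`, and `train` are hypothetical, and this is not the authors' graynet model.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one party


def local_update(weights: Sequence[float], private_data: Sequence[Sample],
                 lr: float = 0.1) -> List[float]:
    """Run one SGD pass of y = w*x + b on a party's private data.

    Only the parameter delta is returned; the data never leaves the party.
    """
    w, b = weights
    for x, y in private_data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w - weights[0], b - weights[1]]


def aggregate(global_weights: Sequence[float],
              updates: Sequence[Sequence[float]],
              sizes: Sequence[int]) -> List[float]:
    """Apply the size-weighted average of all party updates to the shared model."""
    total = sum(sizes)
    return [g + sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i, g in enumerate(global_weights)]


def train(parties: Sequence[Sequence[Sample]], rounds: int = 50) -> List[float]:
    """Alternate local updates and global aggregation for a number of rounds."""
    weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in parties]
        weights = aggregate(weights, updates, [len(d) for d in parties])
    return weights
```

A call such as `train([[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0)]])` runs the loop on two parties; the aggregation step sees only the two-element update vectors, never the `(x, y)` samples.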

translated by Google Translate

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Objective Surgical Skills Assessment and Tool Localization: Results from the MICCAI 2021 SimSurgSkill Challenge

Aneeq Zia , Kiran Bhattacharyya , Xi Liu , Ziheng Wang , Max Berniker , Satoshi Kondo , Emanuele Colleoni , Dimitris Psychogyios , Yueming Jin , Jinfan Zhou

Categories: Computer Vision

2022-12-08

Timely and effective feedback within surgical training plays a critical role in developing the skills required to perform safe and efficient surgery. Feedback from expert surgeons, while especially valuable in this regard, is challenging to acquire due to their typically busy schedules, and may be subject to biases. Formal assessment procedures like OSATS and GEARS attempt to provide objective measures of skill, but remain time-consuming. With advances in machine learning there is an opportunity for fast and objective automated feedback on technical skills. The SimSurgSkill 2021 challenge (hosted as a sub-challenge of EndoVis at MICCAI 2021) aimed to promote and foster work in this endeavor. Using virtual reality (VR) surgical tasks, competitors were tasked with localizing instruments and predicting surgical skill. Here we summarize the winning approaches and how they performed. Using this publicly available dataset and results as a springboard, future work may enable more efficient training of surgeons with advances in surgical data science. The dataset can be accessed from https://console.cloud.google.com/storage/browser/isi-simsurgskill-2021.

Non-uniform Sampling Strategies for NeRF on 360° images

Takashi Otonari , Satoshi Ikehata , Kiyoharu Aizawa

Categories: Computer Vision

2022-12-07

In recent years, the performance of novel view synthesis using perspective images has dramatically improved with the advent of neural radiance fields (NeRF). This study proposes two novel techniques that effectively build NeRF for 360° omnidirectional images. Because a 360° image in equirectangular projection (ERP) format has spatial distortion in its high-latitude regions and a 360° wide viewing angle, NeRF's general ray sampling strategy is ineffective. Hence, the view synthesis accuracy of NeRF is limited and learning is not efficient. We propose two non-uniform ray sampling schemes for NeRF to suit 360° images: distortion-aware ray sampling and content-aware ray sampling. We created an evaluation dataset, Synth360, using Replica and SceneCity models of indoor and outdoor scenes, respectively. In experiments, we show that our proposal successfully builds 360° image NeRF in terms of both accuracy and efficiency. The proposal is widely applicable to advanced variants of NeRF. DietNeRF, AugNeRF, and NeRF++ combined with the proposed techniques further improve the performance. Moreover, we show that our proposed method enhances the quality of real-world scenes in 360° images. Synth360: https://drive.google.com/drive/folders/1suL9B7DO2no21ggiIHkH3JF3OecasQLb.
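The abstract above does not spell out how distortion-aware ray sampling works. One plausible reading, sketched here as an assumption, is to pick pixels in proportion to the solid angle they cover on the sphere, so that the stretched rows near the poles of the equirectangular image are sampled less often. The names `row_weights` and `sample_pixels` are hypothetical and the weighting is a generic one, not necessarily the authors' scheme.

```python
import math
import random
from typing import List, Tuple


def row_weights(height: int) -> List[float]:
    """Relative solid angle of one pixel in each row of an equirectangular image.

    Row v is centred at colatitude pi * (v + 0.5) / height, and the area a
    pixel covers on the unit sphere is proportional to the sine of that angle.
    """
    return [math.sin(math.pi * (v + 0.5) / height) for v in range(height)]


def sample_pixels(height: int, width: int, n: int,
                  seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n (row, column) pixel positions, rows weighted by solid angle."""
    rng = random.Random(seed)
    rows = rng.choices(range(height), weights=row_weights(height), k=n)
    # Columns are uniform: the distortion depends only on latitude.
    return [(v, rng.randrange(width)) for v in rows]
```

With this weighting, rows near the equator are drawn most often and rows at the top and bottom of the image least often, which counteracts the oversampling of polar regions that uniform pixel sampling would produce.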

P2Net: A Post-Processing Network for Refining Semantic Segmentation of LiDAR Point Cloud based on Consistency of Consecutive Frames

Yutaka Momma , Weimin Wang , Edgar Simo-Serra , Satoshi Iizuka , Ryosuke Nakamura , Hiroshi Ishikawa

Categories: Computer Vision | Robotics

2022-12-01

We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine these results predicted by an existing segmentation method. The network, which we call the P2Net, learns the consistency constraints between coincident points from consecutive frames after registration. We evaluate the proposed post-processing method both qualitatively and quantitatively on the SemanticKITTI dataset that consists of real outdoor scenes. The effectiveness of the proposed method is validated by comparing the results predicted by two representative networks with and without the refinement by the post-processing network. Specifically, qualitative visualization validates the key idea that labels of the points that are difficult to predict can be corrected with P2Net. Quantitatively, overall mIoU is improved from 10.5% to 11.7% for PointNet [1] and from 10.8% to 15.9% for PointNet++ [2].
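The quantitative results above are reported as mIoU (mean intersection over union). As a reminder of how that metric is usually computed over per-point labels, here is a small sketch. It assumes classes absent from both prediction and ground truth are skipped, which is a common convention but not one confirmed by the abstract; `mean_iou` is a hypothetical name.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> float:
    """Average the per-class intersection over union across per-point labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither labeling
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `truth = [0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.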

Instance-level Heterogeneous Domain Adaptation for Limited-labeled Sketch-to-Photo Retrieval

Fan Yang , Yang Wu , Zheng Wang , Xiang Li , Sakriani Sakti , Satoshi Nakamura

Categories: Computer Vision | Machine Learning

2022-11-26

Although sketch-to-photo retrieval has a wide range of applications, it is costly to obtain paired and rich-labeled ground truth. In contrast, photo retrieval data is easier to acquire. Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i.e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i.e., target domain). However, without co-training source and target data, source domain knowledge might be forgotten during the fine-tuning process, while simply co-training them may cause negative transfer due to domain gaps. Moreover, identity label spaces of source data and target data are generally disjoint and therefore conventional category-level Domain Adaptation (DA) is not directly applicable. To address these issues, we propose an Instance-level Heterogeneous Domain Adaptation (IHDA) framework. We apply the fine-tuning strategy for identity label learning, aiming to transfer the instance-level knowledge in an inductive transfer manner. Meanwhile, labeled attributes from the source data are selected to form a shared label space for source and target domains. Guided by shared attributes, DA is utilized to bridge cross-dataset domain gaps and heterogeneous domain gaps, which transfers instance-level knowledge in a transductive transfer manner. Experiments show that our method has set a new state of the art on three sketch-to-photo image retrieval benchmarks without extra annotations, which opens the door to training more effective models on limited-labeled heterogeneous image retrieval tasks. Related codes are available at https://github.com/fandulu/IHDA.

Deep generative model super-resolves spatially correlated multiregional climate data

Norihiro Oyama , Noriko N. Ishizaki , Satoshi Koide , Hiroaki Yoshida

Categories: Machine Learning

2022-09-26

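Super-resolving the coarse outputs of global climate simulations, termed downscaling, is crucial for making political and social decisions about systems that require long-term climate change projections. Existing fast super-resolution techniques, however, have yet to preserve the spatial correlations of climate data, which is particularly important when dealing with systems that have spatial extent, such as the development of transportation infrastructure. Here we show that adversarial-network-based machine learning enables us to correctly reconstruct inter-regional spatial correlations in downscaling with a magnification of up to fifty, while maintaining pixel-wise statistical consistency. Direct comparison with measured meteorological data of temperature and precipitation distributions reveals that integrating climatologically important physical information is essential for accurate downscaling, which prompts us to call our approach πSRGAN (Physics Informed Super-Resolution Generative Adversarial Network). The present method has potential applications to the inter-regionally consistent assessment of climate change impacts.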

Saliency-based Multiple Region of Interest Detection from a Single 360° image

Yuuki Sawabe , Satoshi Ikehata , Kiyoharu Aizawa

Categories: Computer Vision

2022-09-08

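360° images are informative: they contain omnidirectional visual information around the camera. However, the area covered by a 360° image is much larger than the human field of view, so important information in different view directions is easily overlooked. To tackle this issue, we propose a method for predicting the optimal set of regions of interest (RoIs) from a single 360° image using visual saliency as a clue. To deal with the scarce, biased training data of existing single 360° image saliency prediction datasets, we also propose a data augmentation method based on spherical random data rotation. From the predicted saliency map and redundant candidate regions, we obtain the optimal set of RoIs by considering both the saliency within a region and the intersection over union (IoU) between regions. We conduct a subjective evaluation to show that the proposed method can select regions that properly summarize the input 360° image.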
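The abstract above selects RoIs by trading off the saliency inside a region against the IoU between regions, without giving the optimization details. A simple stand-in for that trade-off is greedy selection in the style of non-maximum suppression, sketched below. It assumes axis-aligned boxes `(x0, y0, x1, y1)` on the image plane and ignores spherical wrap-around; `iou`, `select_rois`, and the thresholds are hypothetical choices, not the authors' method.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_rois(candidates: Sequence[Tuple[Box, float]],
                max_iou: float = 0.3, k: int = 3) -> List[Tuple[Box, float]]:
    """Greedily keep the most salient boxes that overlap little with earlier picks."""
    chosen: List[Tuple[Box, float]] = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(iou(box, other) <= max_iou for other, _ in chosen):
            chosen.append((box, score))
        if len(chosen) == k:
            break
    return chosen
```

Given three candidates where the second heavily overlaps the first, the function keeps the first and third: high saliency wins, and redundant views of the same content are dropped.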

Source-Free Unsupervised Domain Adaptation with Norm and Shape Constraints for Medical Image Segmentation

Satoshi Kondo

Categories: Computer Vision

2022-09-03

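Unsupervised domain adaptation (UDA) is one of the key technologies for problems where the ground truth labels needed for supervised learning are hard to obtain. In general, UDA assumes that all samples from the source and target domains are available during training. However, this is not a realistic assumption in applications where data privacy is a concern. To overcome this limitation, UDA without source data, referred to as source-free unsupervised domain adaptation (SFUDA), has recently been proposed. Here, we propose an SFUDA method for medical image segmentation. In addition to the entropy minimization method commonly used in UDA, we introduce a loss function that keeps feature norms in the target domain from becoming small, and a prior that preserves the shape constraints of the target organs. We conduct experiments on datasets that include multiple types of source-target domain combinations to show the versatility and robustness of our method. We confirm that our method outperforms the state of the art on all datasets.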
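Two of the loss terms named in the abstract above can be illustrated with a small sketch. Entropy minimization is standard: the mean Shannon entropy of the per-pixel class probabilities is used as a loss. The feature-norm term below is only an assumed form (the negative mean L2 norm, so that minimizing it discourages small norms); the abstract does not give the actual formula, and the shape prior is not sketched. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy_loss(logits_per_pixel: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy of the predicted class distributions."""
    total = 0.0
    for logits in logits_per_pixel:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0)
    return total / len(logits_per_pixel)


def feature_norm_loss(features: Sequence[Sequence[float]]) -> float:
    """Assumed form: negative mean L2 norm, so lower loss means larger norms."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    return -sum(norms) / len(norms)
```

A uniform two-class prediction gives the maximum entropy `log(2)`, while a confident prediction such as logits `[10.0, 0.0]` gives a value near zero, which is why minimizing this term sharpens predictions on unlabeled target data.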

Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos

Fan Yang , Norimichi Ukita , Sakriani Sakti , Satoshi Nakamura

Categories: Computer Vision

2022-08-27

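The success of deep learning in video action recognition (AR) has motivated researchers to progressively advance related tasks from the coarse level to the fine-grained level. Compared with conventional AR, which only predicts an action label for the entire video, temporal action detection (TAD) has been investigated for estimating the start and end time of each action in a video. Taking TAD a step further, spatiotemporal action detection (SAD) has been studied for localizing the action both spatially and temporally in videos. However, who performs the action is generally ignored in SAD, even though identifying the actor can also be important. To this end, we propose a novel task, actor-identified spatiotemporal action detection (ASAD), to bridge the gap between SAD and actor identification. In ASAD, we not only detect the spatiotemporal boundary of each instance-level action but also assign a unique ID to each actor. To approach ASAD, multiple object tracking (MOT) and action classification (AC) are two fundamental elements. By using MOT, the spatiotemporal boundary of each actor is obtained and assigned to a unique actor identity. By using AC, the action class is estimated within the corresponding spatiotemporal boundary. Since ASAD is a new task, it poses many new challenges that cannot be addressed by existing methods: i) no dataset has been created specifically for ASAD, ii) no evaluation metrics have been designed for ASAD, and iii) current MOT performance is the bottleneck to obtaining satisfactory ASAD results. To address these problems, we i) annotate a new ASAD dataset, ii) propose ASAD evaluation metrics that consider multi-label actions and actor identification, and iii) improve the data association strategies in MOT to boost MOT performance, which leads to better ASAD results. The code is available at https://github.com/fandulu/asad.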
